 similar word


From Ghazals to Sonnets: Decoding the Polysemous Expressions of Love Across Languages

arXiv.org Artificial Intelligence

This paper delves into the intricate world of Urdu poetry, exploring its thematic depths through a lens of polysemy. By focusing on the nuanced differences between three seemingly synonymous words (pyaar, muhabbat, and ishq), we expose a spectrum of emotions and experiences unique to the Urdu language. This study employs a polysemic case study approach, meticulously examining how these words are interwoven within the rich tapestry of Urdu poetry. By analyzing their usage and context, we uncover a hidden layer of meaning, revealing subtle distinctions which lack direct equivalents in English literature. Furthermore, we embark on a comparative analysis, generating word embeddings for both Urdu and English terms related to love. This enables us to quantify and visualize the semantic space occupied by these words, providing valuable insights into the cultural and linguistic nuances of expressing love. Through this multifaceted approach, our study sheds light on the captivating complexities of Urdu poetry, offering a deeper understanding and appreciation for its unique portrayal of love and its myriad expressions.


A Quantitative Discourse Analysis of Asian Workers in the US Historical Newspapers

arXiv.org Artificial Intelligence

Warning: This paper contains examples of offensive language targeting marginalized populations. The digitization of historical texts invites researchers to explore large-scale corpora of historical texts with computational methods. In this study, we present a computational text analysis of a relatively understudied topic: how Asian workers are represented in historical newspapers in the United States. We found that the word "coolie" was semantically different in some states (e.g., Massachusetts, Rhode Island, Wyoming, Oklahoma, and Arkansas), reflecting different discourses around the term. By measuring over-represented words, we also found that then-Confederate newspapers and then-Union newspapers formed distinctive discourses. Newspapers from then-Confederate states associated "coolie" with slavery-related words. In addition, we found that Asians were perceived to be inferior to European immigrants and were targets of racism. This study supplements the qualitative analysis of racism in the United States with quantitative discourse analysis.


The flow of ideas in word embeddings

arXiv.org Artificial Intelligence

The flow of ideas has been extensively studied by physicists, psychologists, and machine learning engineers. This paper adopts specific tools from microrheology to investigate the similarity-based flow of ideas. We introduce a random walker in word embeddings and study its behavior. Such similarity-mediated random walks through the embedding space show signatures of anomalous diffusion commonly observed in complex structured systems such as biological cells and complex fluids. The paper concludes by proposing the application of popular tools employed in the study of random walks and diffusion of particles under Brownian motion to assess quantitatively the incorporation of diverse ideas in a document. Overall, this paper presents a self-referenced method combining microrheology and machine learning concepts to explore the meandering tendencies of language models and their potential association with creativity.


New Developments in the field of Computation and Language part2(Artificial Intelligence)

#artificialintelligence

Abstract: We present the first openly available multimodal metaphor annotated corpus. The corpus consists of videos including audio and subtitles that have been annotated by experts. Furthermore, we present a method for detecting metaphors in the new dataset based on the textual content of the videos. The method achieves a high F1-score (62%) for metaphorical labels. We also experiment with other modalities and multimodal methods; however, these methods did not outperform the text-based model.


Synonym Detection Using Syntactic Dependency And Neural Embeddings

arXiv.org Artificial Intelligence

Recent advances on the Vector Space Model have significantly improved some NLP applications such as neural machine translation and natural language generation. Although word co-occurrences in context have been widely used in counting-/predicting-based distributional models, the role of syntactic dependencies in deriving distributional semantics has not yet been thoroughly investigated. By comparing various Vector Space Models in detecting synonyms in TOEFL, we systematically study the salience of syntactic dependencies in accounting for distributional similarity. We separate syntactic dependencies into different groups according to their various grammatical roles and then use context-counting to construct their corresponding raw and SVD-compressed matrices. Moreover, using the same training hyperparameters and corpora, we study typical neural embeddings in the evaluation. We further study the effectiveness of injecting human-compiled semantic knowledge into neural embeddings on computing distributional similarity. Our results show that the syntactically conditioned contexts can interpret lexical semantics better than the unconditioned ones, whereas retrofitting neural embeddings with semantic knowledge can significantly improve synonym detection.
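The TOEFL-style synonym task mentioned above is commonly framed as picking, from a few candidates, the word whose vector is closest to the target. The following is only a minimal sketch of that idea, not the paper's pipeline: the toy vectors, the word list, and the helper names `cosine` and `pick_synonym` are all hypothetical and chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pick_synonym(target, candidates, embeddings):
    """Return the candidate whose vector is most similar to the target's."""
    return max(candidates, key=lambda w: cosine(embeddings[target], embeddings[w]))

# Hypothetical 3-dimensional toy embeddings (not taken from any trained model).
toy_embeddings = {
    "enormous": [0.9, 0.1, 0.0],
    "huge": [0.8, 0.2, 0.1],
    "tiny": [-0.7, 0.3, 0.0],
    "quick": [0.0, 0.9, 0.2],
    "green": [0.1, 0.0, 0.9],
}

answer = pick_synonym("enormous", ["huge", "tiny", "quick", "green"], toy_embeddings)
print(answer)
```

In a real evaluation the vectors would come from the count-based or neural models being compared, and accuracy would be the fraction of questions answered correctly.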


emojiSpace: Spatial Representation of Emojis

arXiv.org Artificial Intelligence

In the absence of nonverbal cues during messaging communication, users express part of their emotions using emojis. Thus, having emojis in the vocabulary of text messaging language models can significantly improve many natural language processing (NLP) applications such as online communication analysis. On the other hand, word embedding models are usually trained on a very large corpus of text such as Wikipedia or Google News datasets that include very few samples with emojis. In this study, we create emojiSpace, which is a combined word-emoji embedding using the word2vec model from the Gensim library in Python. We trained emojiSpace on a corpus of more than 4 billion tweets and evaluated it by implementing sentiment analysis on a Twitter dataset containing more than 67 million tweets as an extrinsic task. For this task, we compared the performance of two classifiers: random forest (RF) and linear support vector machine (SVM). For evaluation, we compared emojiSpace performance with two other pre-trained embeddings and demonstrated that emojiSpace outperforms both.


Word2vec with PyTorch: Implementing the Original Paper

#artificialintelligence

Word Embeddings is the most fundamental concept in Deep Natural Language Processing. And word2vec is one of the earliest algorithms used to train word embeddings. In this post, I want to go deeper into the first paper on word2vec -- Efficient Estimation of Word Representations in Vector Space (2013), which as of now has 24k citations, and this number is still growing. I am attaching my Github project with word2vec training. We will go through it in this post.


Autocorrect Feature using NLP in Python

#artificialintelligence

This article was published as a part of the Data Science Blogathon. Natural Language Processing (NLP) is the field of artificial intelligence that connects linguistics with computer science. I am assuming that you have understood the basic concepts of NLP, so we will move ahead. Have you ever wondered how the autocorrect feature works on a smartphone keyboard?


Word Embeddings in High-Level

#artificialintelligence

The most common representation of words in NLP tasks is one-hot encoding. Below we can see an example of one-hot encoding for the words "Cat" and "Dog". As we can see, these two vectors are independent since their inner product is 0, and their Euclidean distance is \(\sqrt{2}\). Notice that this applies to every pair in the vocabulary, meaning that every pair of distinct words is independent, and the distance between them is \(\sqrt{2}\). For example, the words below are considered independent, and the distance (and hence the similarity) between any pair of words is the same. This is an issue for NLP tasks since we want to be able to capture the relation between words.
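The claim about one-hot vectors can be checked numerically. The sketch below assumes a hypothetical three-word vocabulary and uses illustrative helper names (`one_hot`, `dot`, `euclidean`); it shows that every pair of distinct words has inner product 0 and Euclidean distance equal to the square root of 2.

```python
import math
from itertools import combinations

def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical toy vocabulary; any vocabulary gives the same result.
vocab = ["cat", "dog", "car"]
vectors = {word: one_hot(i, len(vocab)) for i, word in enumerate(vocab)}

# Every distinct pair has inner product 0 and the same distance.
for a, b in combinations(vocab, 2):
    print(a, b, dot(vectors[a], vectors[b]), euclidean(vectors[a], vectors[b]))
```

Because "cat" is exactly as far from "dog" as it is from "car", one-hot vectors carry no information about which words are related, which is the gap that learned embeddings are meant to fill.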